WHAT IS CLAIMED IS:

1. A subcollection of samples from a target population, comprising:
a plurality of samples, wherein the samples are selected from the group consisting of blood, tissue, body fluid, cell, seed, microbe, pathogen and reproductive tissue samples; and
a symbology on the containers containing the samples, wherein the symbology is representative of the source and/or history of each sample,
wherein:
the target population is a healthy population that has not been selected for any disease state;
the collection comprises samples from the healthy population; and
the subcollection is obtained by sorting the collection according to specified parameters.
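The sorting step recited in claim 1 can be pictured as a filter over the donor parameters attached to each container's symbology. The following Python sketch is illustrative only: the `Sample` record, the barcode strings and the parameter names are hypothetical, and each criterion may be either an exact value or a predicate.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    barcode: str                                  # symbology on the container (hypothetical)
    sample_type: str                              # e.g. "blood" or "tissue"
    params: dict = field(default_factory=dict)    # donor parameters from the questionnaire


def subcollection(collection, **criteria):
    """Return the samples whose donor parameters satisfy every criterion.

    Each criterion is either an exact value or a one-argument predicate.
    """
    def matches(sample):
        for key, wanted in criteria.items():
            value = sample.params.get(key)
            if callable(wanted):
                if value is None or not wanted(value):
                    return False
            elif value != wanted:
                return False
        return True

    # Order the subcollection by barcode so it can be laid out on a rack.
    return sorted((s for s in collection if matches(s)), key=lambda s: s.barcode)


collection = [
    Sample("BC0001", "blood", {"gender": "F", "age": 34, "smoker": "never"}),
    Sample("BC0002", "blood", {"gender": "M", "age": 61, "smoker": "former"}),
    Sample("BC0003", "tissue", {"gender": "F", "age": 72, "smoker": "never"}),
]
older_women = subcollection(collection, gender="F", age=lambda a: a >= 60)
```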

2. The subcollection of claim 1, wherein the parameters are selected from the group consisting of ethnicity, age, gender, height, weight, alcohol intake, number of pregnancies, number of live births, vegetarians, type of physical activity, state of residence and/or length of residence in a particular state, educational level, age of parent at death, cause of parent death, former or current smoker, length of time as a smoker, frequency of smoking, occurrence of a disease in immediate family (parent, siblings, children), use of prescription drugs and/or reason therefor, length and/or number of hospital stays and exposure to environmental factors.

3. The subcollection of claim 1, wherein the symbology is a bar code.
4. A method of producing a database, comprising:

identifying healthy members of a population;

obtaining data comprising identifying information and obtaining historical 
information and data relating to the identified members of the population and 
their immediate family; 

entering the data into a database for each member of the population and associating the member and the data with an indexer.

5. The method of claim 4, further comprising:
obtaining a body tissue or body fluid sample;
analyzing the body tissue or body fluid in the sample; and
entering the results of the analysis for each member into the database and associating each result with the indexer representative of each member.
6. A database produced by the method of claim 4. 
7. A database produced by the method of claim 5.

8. A database, comprising:

datapoints representative of a plurality of healthy organisms from whom biological samples are obtained, wherein each datapoint is associated with data representative of the organism type and other identifying information.

9. The database of claim 8, wherein the datapoints are answers to questions regarding one or more parameters selected from the group consisting of ethnicity, age, gender, height, weight, alcohol intake, number of pregnancies, number of live births, vegetarians, type of physical activity, state of residence and/or length of residence in a particular state, educational level, age of parent at death, cause of parent death, former or current smoker, length of time as a smoker, frequency of smoking, occurrence of a disease in immediate family (parent, siblings, children), use of prescription drugs and/or reason therefor, length and/or number of hospital stays and exposure to environmental factors.

10. The database of claim 9, wherein the organisms are mammals and 
the samples are body fluids or tissues. 

11. The database of claim 9, wherein the samples are selected from blood, blood fractions, cells and subcellular organelles.
12. The database of claim 8, further comprising:
phenotypic data from an organism.

13. The database of claim 12, wherein the data includes one of physical 
characteristics, background data, medical data, and historical data. 

14. The database of claim 8, further comprising:
genotypic data from nucleic acid obtained from an organism.
15. The database of claim 14, wherein the genotypic data includes genetic markers, non-coding regions, microsatellites, RFLPs, VNTRs, historical data of the organism, medical history, and phenotypic information.

16. The database of claim 8 that is a relational database. 

17. The database of claim 16, wherein the data are related to an indexer datapoint representative of each organism from whom data is obtained.
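A minimal sketch of a relational layout in which every table refers back to one indexer datapoint per organism, using Python's built-in `sqlite3` module. The table and column names are assumptions chosen for illustration, not a schema taken from this document.

```python
import sqlite3


def build_database():
    """Create an in-memory relational database keyed on one indexer per member
    (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE member (
            indexer    INTEGER PRIMARY KEY,   -- indexer datapoint for each organism
            organism   TEXT NOT NULL,
            gender     TEXT,
            birth_year INTEGER
        );
        CREATE TABLE questionnaire (
            indexer   INTEGER REFERENCES member(indexer),
            parameter TEXT NOT NULL,
            answer    TEXT
        );
        CREATE TABLE genotype (
            indexer INTEGER REFERENCES member(indexer),
            marker  TEXT NOT NULL,
            allele  TEXT NOT NULL
        );
    """)
    return con


con = build_database()
con.execute("INSERT INTO member VALUES (1, 'human', 'F', 1960)")
con.execute("INSERT INTO member VALUES (2, 'human', 'M', 1975)")
con.executemany("INSERT INTO questionnaire VALUES (?, ?, ?)",
                [(1, "smoker", "former"), (2, "smoker", "never")])
con.executemany("INSERT INTO genotype VALUES (?, ?, ?)",
                [(1, "SNP_1", "A/G"), (2, "SNP_1", "A/A")])

# Relate questionnaire answers and genotypes through the shared indexer.
rows = con.execute("""
    SELECT m.indexer, q.answer, g.allele
    FROM member m
    JOIN questionnaire q ON q.indexer = m.indexer AND q.parameter = 'smoker'
    JOIN genotype g ON g.indexer = m.indexer AND g.marker = 'SNP_1'
    ORDER BY m.indexer
""").fetchall()
```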

18. A method of identifying polymorphisms that are candidate genetic markers, comprising:

identifying a polymorphism; and
identifying any pathway or gene linked to the locus of the polymorphism, wherein
the polymorphisms are identified in samples associated with a target population that comprises healthy subjects.

19. The method of claim 18, wherein the polymorphism is identified by detecting the presence of target nucleic acids in a sample by a method comprising the steps of:

a) hybridizing a first oligonucleotide to the target nucleic acid;
b) hybridizing a second oligonucleotide to an adjacent region of the target nucleic acid;
c) ligating the hybridized oligonucleotides; and
d) detecting hybridized first oligonucleotide by mass spectrometry as an indication of the presence of the target nucleic acid.

20. The method of claim 18, wherein the polymorphism is identified by detecting target nucleic acids in a sample by a method comprising the steps of:

a) hybridizing a first oligonucleotide to the target nucleic acid and hybridizing a second oligonucleotide to an adjacent region of the target nucleic acid;
b) contacting the hybridized first and second oligonucleotides with a cleavage enzyme to form a cleavage product; and
c) detecting the cleavage product by mass spectrometry as an indication of the presence of the target nucleic acid.
21. The method of claim 20, wherein the samples are from subjects in a healthy database.

22. The method of claim 18, wherein the polymorphism is identified by identifying target nucleic acids in a sample by primer oligo base extension (PROBE).

23. The method of claim 22, wherein primer oligo base extension comprises:

a) obtaining a nucleic acid molecule that contains a target nucleotide;
b) optionally immobilizing the nucleic acid molecule onto a solid support, to produce an immobilized nucleic acid molecule;
c) hybridizing the nucleic acid molecule with a primer oligonucleotide that is complementary to the nucleic acid molecule at a site adjacent to the target nucleotide;
d) contacting the product of step c) with a composition comprising a dideoxynucleoside triphosphate or a 3'-deoxynucleoside triphosphate and a polymerase, so that only a dideoxynucleoside or 3'-deoxynucleoside triphosphate that is complementary to the target nucleotide is extended onto the primer; and
e) detecting the extended primer, thereby identifying the target nucleotide.
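Because only one dideoxynucleotide is added in step d), the identity of the target nucleotide can be read from the mass difference between the unextended and extended primer. The sketch below assumes approximate average residue masses for the four dideoxynucleotides (each about 16 Da lighter than the corresponding deoxynucleotide residue) and a hypothetical 2 Da matching tolerance; a real assay would use calibrated values.

```python
# Approximate average masses (Da) added to a primer by one dideoxynucleotide
# residue; illustrative values, each ~16 Da below the deoxy residue.
DDNMP_RESIDUE_MASS = {"C": 273.18, "T": 288.20, "A": 297.21, "G": 313.21}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def call_target_nucleotide(primer_mass, extended_mass, tolerance=2.0):
    """Infer the template's target nucleotide from the mass shift of a
    primer extended by exactly one dideoxynucleotide.

    Returns None when no residue matches within `tolerance` Da.
    """
    shift = extended_mass - primer_mass
    added = min(DDNMP_RESIDUE_MASS, key=lambda b: abs(DDNMP_RESIDUE_MASS[b] - shift))
    if abs(DDNMP_RESIDUE_MASS[added] - shift) > tolerance:
        return None
    # The incorporated base pairs with, and so is complementary to, the target.
    return COMPLEMENT[added]


print(call_target_nucleotide(5000.0, 5297.3))  # ddA added, so the target is T
```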

24. The method of claim 23, wherein detection of the extended primer is effected by mass spectrometry comprising:

ionizing and volatizing the product of step d); and
detecting the extended primer by mass spectrometry, thereby identifying the target nucleotide.
25. The method of claim 24, wherein:

samples are presented to the mass spectrometer as arrays on chips; and
each sample occupies a volume that is about the size of the laser spot projected by the laser in a mass spectrometer used in matrix-assisted laser desorption/ionization (MALDI) spectrometry.
26. A combination, comprising:

a database containing parameters associated with a datapoint representative of a subject from whom samples are obtained, wherein the subjects are healthy; and
an indexed collection of the samples, wherein the index identifies the subject from whom the sample was obtained.

27. The combination of claim 26, wherein the parameter is selected from the group consisting of ethnicity, age, gender, height, weight, alcohol intake, number of pregnancies, number of live births, vegetarians, type of physical activity, state of residence and/or length of residence in a particular state, educational level, age of parent at death, cause of parent death, former or current smoker, length of time as a smoker, frequency of smoking, occurrence of disease in immediate family (parent, siblings, children), use of prescription drugs and/or reason therefor, length and/or number of hospital stays and exposure to environmental factors.

28. The combination of claim 26, wherein the database further contains genotypic data for each subject.

29. The combination of claim 26, wherein the samples are blood.
30. A data storage medium, comprising the database of claim 8.

31. A computer system, comprising the database of claim 8.

32. A system for high throughput processing of biological samples, comprising:

a process line comprising a plurality of processing stations, each of which performs a procedure on a biological sample contained in a reaction vessel;
a robotic system that transports the reaction vessel from processing station to processing station;
a data analysis system that receives test results of the process line and automatically processes the test results to make a determination regarding the biological sample in the reaction vessel;
a control system that determines when the test at each processing station is complete and, in response, moves the reaction vessel to the next test station, and continuously processes reaction vessels one after another until the control system receives a stop instruction; and
a database of claim 8, wherein the samples tested by the automated process line comprise samples from subjects in the database.

33. The system of claim 32, wherein one of the processing stations 
comprises a mass spectrometer. 

34. The system of claim 32, wherein the data analysis system processes the test results by receiving test data from the mass spectrometer such that the test data for a biological sample contains one or more signals, whereupon the data analysis system determines the area under the curve of each signal and normalizes the results thereof and obtains a substantially quantitative result representative of the relative amounts of components in the tested sample.
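Claim 34's area-under-the-curve normalization can be sketched as trapezoidal integration over one m/z window per signal, followed by division by the summed area. The window boundaries are assumed inputs here; how signals are delimited in practice is not specified in this section.

```python
def peak_areas(mz, intensity, windows):
    """Trapezoidal area under the curve inside each (low, high) m/z window."""
    areas = []
    for low, high in windows:
        area = 0.0
        for i in range(1, len(mz)):
            # Only integrate segments that lie entirely inside the window.
            if low <= mz[i - 1] and mz[i] <= high:
                area += 0.5 * (intensity[i - 1] + intensity[i]) * (mz[i] - mz[i - 1])
        areas.append(area)
    return areas


def relative_amounts(mz, intensity, windows):
    """Normalize the per-signal areas so that they sum to 1."""
    areas = peak_areas(mz, intensity, windows)
    total = sum(areas)
    if total == 0:
        raise ValueError("no signal inside the given windows")
    return [a / total for a in areas]


mz = [0, 1, 2, 3, 4, 5, 6, 7]
intensity = [0, 2, 0, 0, 0, 6, 0, 0]
print(relative_amounts(mz, intensity, [(0, 2), (4, 6)]))  # [0.25, 0.75]
```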

35. A method for high throughput processing of biological samples, the method comprising:

transporting a reaction vessel along a system of claim 32, comprising a process line having a plurality of processing stations, each of which performs a procedure on one or more biological samples contained in the reaction vessel;
determining when the test procedure at each processing station is complete and, in response, moving the reaction vessel to the next processing station;
receiving test results of the process line and automatically processing the test results to make a data analysis determination regarding the biological samples in the reaction vessel; and
processing reaction vessels continuously one after another until receiving a stop instruction, wherein the samples tested by the automated process line comprise samples from subjects in the database.
36. The method of claim 35, wherein one of the processing stations comprises a mass spectrometer.
37. The method of claim 36, wherein the samples are analyzed by a method comprising primer oligo base extension (PROBE).

38. The method of claim 37, further comprising:
processing the test results by receiving test data from the mass spectrometer such that the test data for a biological sample contains one or more signals or numerical values representative of signals, whereupon the data analysis system determines the area under the curve of each signal and normalizes the results thereof and obtains a substantially quantitative result representative of the relative amounts of components in the tested sample.
39. The method of claim 37, wherein primer oligo base extension comprises:

a) obtaining a nucleic acid molecule that contains a target nucleotide;
b) optionally immobilizing the nucleic acid molecule onto a solid support, to produce an immobilized nucleic acid molecule;
c) hybridizing the nucleic acid molecule with a primer oligonucleotide that is complementary to the nucleic acid molecule at a site adjacent to the target nucleotide;
d) contacting the product of step c) with a composition comprising a dideoxynucleoside triphosphate or a 3'-deoxynucleoside triphosphate and a polymerase, so that only a dideoxynucleoside or 3'-deoxynucleoside triphosphate that is complementary to the target nucleotide is extended onto the primer; and
e) detecting the primer, thereby identifying the target nucleotide.

40. The method of claim 39, wherein detection of the extended primer is effected by mass spectrometry comprising:

ionizing and volatizing the product of step d); and
detecting the extended primer by mass spectrometry, thereby identifying the target nucleotide.

41. The method of claim 36, wherein the target nucleic acids in the sample are detected and/or identified by a method comprising the steps of:

a) hybridizing a first oligonucleotide to the target nucleic acid;
b) hybridizing a second oligonucleotide to an adjacent region of the target nucleic acid;
c) ligating the hybridized oligonucleotides; and
d) detecting hybridized first oligonucleotide by mass spectrometry as an indication of the presence of the target nucleic acid.

42. The method of claim 36, wherein the target nucleic acids in the sample are detected and/or identified by a method comprising the steps of:

a) hybridizing a first oligonucleotide to the target nucleic acid and hybridizing a second oligonucleotide to an adjacent region of the target nucleic acid;
b) contacting the hybridized first and second oligonucleotides with a cleavage enzyme to form a cleavage product; and
c) detecting the cleavage product by mass spectrometry as an indication of the presence of the target nucleic acid.

43. A method of producing a database stored in a computer memory, comprising:

identifying healthy members of a population;
obtaining identifying and historical information and data relating to the identified members of the population; and
entering the member-related data into the computer memory database for each identified member of the population and associating the member and the data with an indexer.

44. The method of claim 43, further comprising:

obtaining a body tissue or body fluid sample of an identified member;
analyzing the body tissue or body fluid in the sample; and
entering the results of the analysis for each member into the computer memory database and associating each result with the indexer representative of each member.

45. A database produced by the method of claim 43. 

46. A database produced by the method of claim 44. 

47. The database of claim 8, wherein:

the organisms are selected from among animals, bacteria, fungi, protozoans and parasites; and
each datapoint is associated with parameters representative of the organism type and identifying information.

48. The database of claim 43, further comprising, 
phenotypic data regarding each subject. 
49. The database of claim 47 that is a relational database and the parameters are the answers to the questions in the questionnaire.

50. The database of claim 8, further comprising:

genotypic data of nucleic acid of the subject, wherein genotypic data includes, but is not limited to, genetic markers, non-coding regions, microsatellites, restriction fragment length polymorphisms (RFLPs), variable number tandem repeats (VNTRs), historical data of the organism, the medical history of the subject, phenotypic information, and other information.

51. A database, comprising data records stored in computer memory, wherein the data records contain information that identifies healthy members of a population, and also contain identifying and historical information and data relating to the identified members.

52. The database of claim 51, further comprising an index value for each identified member that associates each member of the population with the identifying and historical information and data.

53. A computer system, comprising the database of claim 51.

54. An automated process line, comprising the database of claim 51. 

55. A method for determining a polymorphism that correlates with age, ethnicity or gender, comprising:

identifying a polymorphism; and
determining the frequency of the polymorphism with increasing age, with ethnicity or with gender in a healthy population.

56. A method of determining whether a polymorphism correlates with susceptibility to morbidity, early mortality, or morbidity and early mortality, comprising:

identifying a polymorphism; and
determining the frequency of the polymorphism with increasing age in a healthy population.
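One way to tabulate the frequency of a polymorphism with increasing age is to count allele copies within age bins. The sketch assumes diploid genotypes encoded as two-letter strings and a hypothetical 10-year bin width; it only computes the trend and makes no statistical claim about morbidity or mortality.

```python
from collections import defaultdict


def allele_frequency_by_age(records, allele, bin_width=10):
    """Frequency of `allele` among all allele copies in each age bin.

    `records` holds (age, genotype) pairs with genotypes such as "AG"
    (hypothetical encoding for a biallelic, diploid marker).
    """
    counts = defaultdict(lambda: [0, 0])   # bin start -> [allele copies, all copies]
    for age, genotype in records:
        bin_start = (age // bin_width) * bin_width
        counts[bin_start][0] += genotype.count(allele)
        counts[bin_start][1] += len(genotype)
    return {b: hits / total for b, (hits, total) in sorted(counts.items())}


records = [(25, "AG"), (28, "AA"), (72, "GG"), (75, "AG")]
print(allele_frequency_by_age(records, "G"))  # {20: 0.25, 70: 0.75}
```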
57. A high throughput method of determining frequencies of genetic variations, comprising:

selecting a healthy target population and a genetic variation to be assessed;
pooling a plurality of samples of biopolymers obtained from members of the population;
determining or detecting the biopolymer that comprises the variation by mass spectrometry;
obtaining a mass spectrum or a digital representation thereof; and
determining the frequency of the variation in the population.

58. The method of claim 57, wherein:

the variation is selected from the group consisting of an allelic variation, a post-translational modification, a nucleic modification, a label, a mass modification of a nucleic acid and methylation; and/or
the biopolymer is a nucleic acid, a protein, a polysaccharide, a lipid, a small organic metabolite or intermediate, wherein the concentration of biopolymer of interest is the same in each of the samples; and/or
the frequency is determined by a method comprising determining the area under the peak in the mass spectrum or digital representation thereof corresponding to the mass of the biopolymer comprising the genomic variation.

59. The method of claim 58, wherein the method for determining the frequency is effected by determining the ratio of the signal or the digital representation thereof to the total area of the entire mass spectrum, which is corrected for background.
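Claim 59's ratio can be sketched for a uniformly sampled spectrum, where areas reduce to sums of intensities. A single constant background level is assumed for the correction; other background models would change only the `corrected` line.

```python
def variant_fraction(intensity, peak_slice, background):
    """Ratio of the background-corrected peak area to the background-corrected
    area of the entire spectrum.

    Assumes uniform sampling, so each area is proportional to a sum of
    intensities, and a single constant background level.
    """
    corrected = [max(0.0, y - background) for y in intensity]
    total = sum(corrected)
    if total == 0:
        raise ValueError("spectrum contains no signal above background")
    return sum(corrected[peak_slice]) / total


spectrum = [1.0, 1.0, 5.0, 1.0, 1.0, 3.0, 1.0]
fraction = variant_fraction(spectrum, slice(1, 4), background=1.0)  # 4 / 6
```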

60. A method for discovery of a polymorphism in a population, comprising:

sorting the database of claim 8 according to a selected parameter to identify samples that match the selected parameter;
isolating a biopolymer from each identified sample;
optionally pooling each isolated biopolymer;
optionally amplifying the amount of biopolymer;
cleaving the pooled biopolymers to produce fragments thereof; and
obtaining a mass spectrum of the resulting fragments and comparing the mass spectrum with a control mass spectrum to identify differences between the spectra and thereby identifying any polymorphisms; wherein:
the control mass spectrum is obtained from unsorted samples in the collection or samples sorted according to a different parameter.

61. The method of claim 60, wherein cleaving is effected by contacting the biopolymer with an enzyme.

62. The method of claim 61, wherein the enzyme is selected from the group consisting of nucleotide glycosylase, a nickase and a type IIS restriction enzyme.

63. The method of claim 60, wherein the biopolymer is a nucleic acid or a protein.

64. The method of claim 60, wherein the mass spectrometric format is selected from among Matrix-Assisted Laser Desorption/Ionization, Time-of-Flight (MALDI-TOF), Electrospray (ES), IR-MALDI, Ion Cyclotron Resonance (ICR), Fourier Transform and combinations thereof.

65. A method for discovery of a polymorphism in a population, comprising:

obtaining samples of body tissue or fluid from a plurality of organisms;
isolating a biopolymer from each sample;
pooling each isolated biopolymer;
optionally amplifying the amount of biopolymer;
cleaving the pooled biopolymers to produce fragments thereof;
obtaining a mass spectrum of the resulting fragments; and
comparing the frequency of each fragment to identify fragments present in amounts lower than average frequency, thereby identifying any polymorphisms.

66. The method of claim 65, wherein cleaving is effected by contacting the biopolymer with an enzyme.
67. The method of claim 66, wherein the enzyme is selected from the group consisting of nucleotide glycosylase, a nickase and a type IIS restriction enzyme.

68. The method of claim 65, wherein the biopolymer is a nucleic acid or a protein.

69. The method of claim 65, wherein the mass spectrometric format is selected from among Matrix-Assisted Laser Desorption/Ionization, Time-of-Flight (MALDI-TOF), Electrospray (ES), IR-MALDI, Ion Cyclotron Resonance (ICR), Fourier Transform and combinations thereof.
70. The method of claim 65, wherein the samples are obtained from healthy subjects.

71. A method of correlating a polymorphism with a parameter, comprising:

sorting the database of claim 8 according to a selected parameter to identify samples that match the selected parameter;
isolating a biopolymer from each identified sample;
pooling each isolated biopolymer;
optionally amplifying the amount of biopolymer; and
determining the frequency of the polymorphism in the pooled biopolymers, wherein:
an alteration of the frequency of the polymorphism compared to a control indicates a correlation of the polymorphism with the selected parameter; and
the control is the frequency of the polymorphism in pooled biopolymers obtained from samples identified from an unsorted database or from a database sorted according to a different parameter.

72. The method of claim 71, wherein the parameter is selected from the group consisting of ethnicity, age, gender, height, weight, alcohol intake, number of pregnancies, number of live births, vegetarians, type of physical activity, state of residence and/or length of residence in a particular state, educational level, age of parent at death, cause of parent death, former or current smoker, length of time as a smoker, frequency of smoking, occurrence of a disease in immediate family (parent, siblings, children), use of prescription drugs and/or reason therefor, length and/or number of hospital stays and exposure to environmental factors.

73. The method of claim 72, wherein the parameter is occurrence of disease or a particular disease in an immediate family member, thereby correlating the polymorphism with the disease.

74. The method of claim 71, wherein the pooled biopolymers are 
pooled nucleic acid molecules. 

75. The method of claim 74, wherein the polymorphism is detected 
by primer oligo base extension (PROBE). 

76. The method of claim 75, wherein primer oligo base extension comprises:

a) optionally immobilizing the nucleic acid molecules onto a solid support, to produce immobilized nucleic acid molecules;
b) hybridizing the nucleic acid molecules with a primer oligonucleotide that is complementary to the nucleic acid molecule at a site adjacent to the polymorphism;
c) contacting the product of step b) with a composition comprising a dideoxynucleoside triphosphate or a 3'-deoxynucleoside triphosphate and a polymerase, so that only a dideoxynucleoside or 3'-deoxynucleoside triphosphate that is complementary to the polymorphism is extended onto the primer; and
d) detecting the extended primer, thereby detecting the polymorphism in nucleic acid molecules in the pooled nucleic acids.

77. The method of claim 76, wherein detecting is effected by mass spectrometry.

78. The method of claim 71, wherein the frequency is the percentage of nucleic acid molecules in the pooled nucleic acids that contain the polymorphism.

79. The method of claim 78, wherein the ratio is determined by obtaining mass spectra of the pooled nucleic acids.
80. The method of claim 72, wherein the parameter is age, thereby correlating the polymorphism with susceptibility to morbidity, early mortality or morbidity and early mortality.
81. A method for haplotyping polymorphisms in a nucleic acid, comprising:

(a) sorting the database of claim 8 according to a selected parameter to identify samples that match the selected parameter;
(b) isolating nucleic acid from each identified sample;
(c) optionally pooling each isolated nucleic acid;
(d) amplifying the amount of nucleic acid;
(e) forming single-stranded nucleic acid and splitting each single strand into a separate reaction vessel;
(f) contacting each single-stranded nucleic acid with an adaptor nucleic acid to form an adaptor complex;
(g) contacting the adaptor complex with a nuclease and a ligase;
(h) contacting the products of step (g) with a mixture that is capable of amplifying a ligated adaptor to produce an extended product;
(i) obtaining a mass spectrum of each nucleic acid resulting from step (h) and detecting a polymorphism by identifying a signal corresponding to the extended product; and
(j) repeating steps (f) through (i) utilizing an adaptor nucleic acid able to hybridize with another adaptor nucleic acid that hybridizes to a different sequence on the same strand; whereby
the polymorphisms are haplotyped by detecting more than one extended product.

82. The method of claim 81, wherein the nuclease is Fen-1.

83. A method for haplotyping polymorphisms in a population, comprising:

sorting the database of claim 8 according to a selected parameter to identify samples that match the selected parameter;
isolating a nucleic acid from each identified sample;
pooling each isolated nucleic acid;
optionally amplifying the amount of nucleic acid;
contacting the nucleic acid with at least one enzyme to produce fragments thereof;
obtaining a mass spectrum of the resulting fragments; whereby:
the polymorphisms are detected by detecting signals corresponding to the polymorphisms; and
the polymorphisms are haplotyped by determining from the mass spectrum that the polymorphisms are located on the same strand of the nucleic acid.

84. The method of claim 83, wherein the enzyme is a nickase.

85. The method of claim 84, wherein the nickase is selected from the group consisting of NY2A and NYS1.

86. A method for detecting methylated nucleotides within a nucleic acid sample, comprising:

splitting a nucleic acid sample into separate reaction vessels;
contacting nucleic acid in one reaction vessel with bisulfite;
amplifying the nucleic acid in each reaction vessel;
cleaving the nucleic acids in each reaction vessel to produce fragments thereof; and
obtaining a mass spectrum of the resulting fragments from one reaction vessel and another mass spectrum of the resulting fragments from another reaction vessel; whereby:
cytosine methylation is detected by identifying a difference in signals between the mass spectra.

87. The method of claim 86, wherein:

the step of amplifying is carried out in the presence of uracil; and
the step of cleaving is effected by a uracil glycosylase.
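The comparison in claim 86 rests on bisulfite chemistry: unmethylated cytosine is converted to uracil and read as thymine after amplification, while methylated cytosine remains cytosine. The sketch below models that behaviour on sequence strings and estimates the resulting fragment mass shift using approximate average residue masses of dC and dT (about 15 Da apart); the sequence and the methylated position are made up for the example.

```python
# Approximate average residue masses (Da) of dC and dT in DNA; illustrative.
DC_MASS, DT_MASS = 289.18, 304.20


def bisulfite_convert(seq, methylated_positions):
    """Model bisulfite treatment plus amplification: an unmethylated C
    ends up as T, while a C at a methylated position stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def methylation_calls(untreated, treated):
    """Map each cytosine position of the untreated read to True if it
    survived treatment (methylated) or False if it became T."""
    return {i: treated[i] == "C" for i, base in enumerate(untreated) if base == "C"}


def fragment_mass_shift(untreated, treated):
    """Mass difference between the treated and untreated fragment
    caused by C -> T conversions."""
    converted = sum(1 for a, b in zip(untreated, treated) if a == "C" and b == "T")
    return converted * (DT_MASS - DC_MASS)


untreated = "ACGTCCGA"
treated = bisulfite_convert(untreated, methylated_positions={1})
print(treated)                                  # ACGTTTGA
print(methylation_calls(untreated, treated))    # {1: True, 4: False, 5: False}
```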
88. A method for identifying a biological sample, comprising:

generating a data set indicative of the composition of the biological sample;
denoising the data set to generate denoised data;
deleting the baseline from the denoised data to generate an intermediate data set;
defining putative peaks for the biological sample;
using the putative peaks to generate a residual baseline;
removing the residual baseline from the intermediate data set to generate a corrected data set;
locating, responsive to removing the residual baseline, a probable peak in the corrected data set; and
identifying, using the located probable peak, the biological sample;
wherein the generated biological sample data set comprises data from sense strands and antisense strands of assay fragments.
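The signal-processing steps of claim 88 are sketched below in plain Python on a uniformly sampled intensity trace. Every concrete choice is an assumption rather than the claimed procedure: a moving average stands in for denoising, a rolling minimum for the baseline, local maxima above half the tallest point for putative peaks, and the mean of off-peak points for the residual baseline. Matching the located peak against expected sense- and antisense-strand masses is left out.

```python
def moving_average(y, half_width):
    """Denoise with a centered moving average (shorter windows at the edges)."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def rolling_min(y, half_width):
    """Crude baseline estimate: minimum within a centered window."""
    n = len(y)
    return [min(y[max(0, i - half_width):min(n, i + half_width + 1)]) for i in range(n)]


def find_probable_peak(y, half_width=2, baseline_width=10, threshold=0.5):
    """Return the index of the probable peak, or None if there is none."""
    denoised = moving_average(y, half_width)
    baseline = rolling_min(denoised, baseline_width)
    intermediate = [d - b for d, b in zip(denoised, baseline)]   # baseline deleted

    # Putative peaks: local maxima at least `threshold` times the tallest point.
    top = max(intermediate)
    putative = [
        i for i in range(1, len(y) - 1)
        if intermediate[i] >= intermediate[i - 1]
        and intermediate[i] > intermediate[i + 1]
        and intermediate[i] >= threshold * top
    ]
    if not putative:
        return None

    # Residual baseline: mean of the points lying outside every putative peak.
    near_peaks = set()
    for p in putative:
        near_peaks.update(range(p - baseline_width, p + baseline_width + 1))
    off_peak = [v for i, v in enumerate(intermediate) if i not in near_peaks]
    residual = sum(off_peak) / len(off_peak) if off_peak else 0.0
    corrected = [v - residual for v in intermediate]

    # Probable peak: the putative peak that is tallest after correction.
    return max(putative, key=lambda i: corrected[i])


# Synthetic trace: a sloping baseline with one peak centred on index 30.
trace = [1.0 + 0.01 * i for i in range(60)]
for offset, height in zip(range(28, 33), [2.0, 6.0, 10.0, 6.0, 2.0]):
    trace[offset] += height
print(find_probable_peak(trace))  # 30
```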

89. The method according to claim 88, wherein identifying includes combining data from the sense strands and the antisense strands, and comparing the data against expected sense strand and antisense strand values, to identify the biological sample.

90. The method according to claim 88, wherein identifying includes deriving a peak probability for the probable peak, in accordance with whether the probable peak is from sense strand data or from antisense strand data.

91. The method according to claim 88, wherein identifying includes deriving a peak probability for the probable peak and applying an allelic penalty in response to a ratio between a calculated area under the probable peak and a calculated expected average area under all peaks in the data set.

92. A method for identifying a biological sample, comprising:

generating a data set indicative of the composition of the biological sample;
denoising the data set to generate denoised data;
deleting the baseline from the denoised data to generate an intermediate data set;
defining putative peaks for the biological sample;
using the putative peaks to generate a residual baseline;
removing the residual baseline from the intermediate data set to generate a corrected data set;
locating, responsive to removing the residual baseline, a probable peak in the corrected data set; and
identifying, using the located probable peak, the biological sample;
wherein identifying includes deriving a peak probability for the probable peak and applying an allelic penalty in response to a ratio between a calculated area under the probable peak and a calculated expected average area under all peaks in the data set.

93. The method according to claim 92, wherein identifying includes comparing data from probable peaks that did not receive an applied allelic penalty to determine their mass in accordance with oligonucleotide biological data.

94. The method according to claim 92, wherein the allelic penalty is not applied to probable peaks whose ratio of area under the peak to the expected area value is greater than 30%.
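The allelic penalty of claims 92 and 94 can be sketched as follows. The expected area is taken as the average area of all peaks in the data set, and peaks whose area ratio exceeds 30% keep their probability; the multiplicative form and the 0.5 penalty factor are hypothetical, since this section does not state how the penalty is computed.

```python
def apply_allelic_penalty(peaks, penalty=0.5, keep_ratio=0.30):
    """Penalize the probability of peaks that are small relative to the
    expected (average) peak area in the data set.

    `peaks` maps a peak label to (probability, area). A peak whose
    area / expected-area ratio is greater than `keep_ratio` is left
    unchanged; otherwise its probability is multiplied by `penalty`
    (a hypothetical factor).
    """
    expected = sum(area for _, area in peaks.values()) / len(peaks)
    adjusted = {}
    for label, (probability, area) in peaks.items():
        ratio = area / expected
        adjusted[label] = probability if ratio > keep_ratio else probability * penalty
    return adjusted


peaks = {"A": (0.9, 100.0), "G": (0.8, 10.0), "C": (0.7, 90.0)}
print(apply_allelic_penalty(peaks))  # {'A': 0.9, 'G': 0.4, 'C': 0.7}
```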

95. A method for detecting a polymorphism in a nucleic acid, comprising:

amplifying a region of the nucleic acid to produce an amplicon, wherein the resulting amplicon comprises one or more enzyme restriction sites;
contacting the amplicon with a restriction enzyme to produce fragments; and
obtaining a mass spectrum of the resulting fragments and analyzing signals in the mass spectrum by the method of claim 88; whereby:
the polymorphism is detected from the pattern of the signals.
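In claim 95 the polymorphism shows up as a change in the fragment pattern when it creates or destroys a restriction site. The sketch uses EcoRI (recognition site GAATTC, cut after the G) purely as an example enzyme and reports top-strand fragment lengths, which stand in for the fragment masses a spectrometer would observe; the amplicon sequences are invented.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut `seq` at each occurrence of `site`, `cut_offset` bases into the
    site (EcoRI cuts G^AATTC), and return top-strand fragment lengths."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


reference = "AAAGAATTCAAAAAGAATTCAA"
variant = "AAAGAATTCAAAAAGATTTCAA"   # A -> T destroys the second site
print(digest(reference))  # [4, 11, 7]
print(digest(variant))    # [4, 18]
```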

96. A subcollection of samples from a target population, comprising:
a plurality of samples, wherein the samples are selected from the group consisting of nucleic acids, fetal tissue and protein samples; and
a symbology on the containers containing the samples, wherein the symbology is representative of the source and/or history of each sample,
wherein:
the target population is a healthy population that has not been selected for any disease state;
the collection comprises samples from the healthy population; and
the subcollection is obtained by sorting the collection according to specified parameters.
97. The combination of claim 26, wherein the samples are selected from the group consisting of nucleic acids, fetal tissue, protein, tissue, body fluid, cell, seed, microbe, pathogen and reproductive tissue samples.

98. A combination, comprising the database of claim 8 and a mass 
spectrometer. 

99. The combination of claim 98 that is an automated process line for 
analyzing biological samples. 

100. A system for high throughput processing of biological samples, 
a database of claim 8, wherein the samples tested by the automated process line comprise samples from subjects in the database; and
a mass spectrometer for analysis of biopolymers in the samples.